The Twofish Encryption Algorithm: A 128-Bit Block Cipher The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson
Wiley Computer Publishing, John Wiley & Sons, Inc.
ISBN: 0471353817   Pub Date: 03/01/99
  

Previous Table of Contents Next


Let us assume that a specific differential has a probability p. If we try the input difference n times, we expect to find this differential around p · n times. The number of times this differential is actually observed is binomially distributed. Let X be a stochastic variable that represents the number of times the differential is observed. We have

From this distribution we can derive a bound on the lower tail of the binomial distribution [Fer98a]:

for kp · n. Given a probability p for the differential, we can say that we have an unlikely event if the differential occurs k times and Pr(Xk) < γ where γ would be a small number. This is a normal test for statistical significance.

We use the following rule to derive a bound on a differential that occurs k out of n times. We use a probability p such that Pr(Xk) < γ for some global parameter γ (typical values for γ are 0.05 or 0.01). Of course, we try to choose p as low as possible given this condition. This will overestimate p for most differentials, but underestimate the actual p in a few cases.

We ran these simulations for all input differences with a low enough number of active S-boxes. For every differential that we tried, we estimated the probability using this rule. For each set of differentials with the same input and output characterization, we computed the maximum estimated probability. For each differential, there is a small chance that we underestimated the probability. However, it is far less likely that we underestimated the maximum probability of a set of differentials. For our maximum to be too low we have to underestimate the probability of the most likely differential. This by itself is rather unlikely. Not only that, we cannot significantly overestimate the probability of any differential with a probability close to the most likely differential. We therefore feel confident that these approximations are reasonable, and that they most likely will result in our overestimating the actual differential probability quite significantly.

For differentials with too many active S-boxes (for which we did not run the simulations), we simply use an upper bound of 1 on the probability of a differential of G.

To improve efficiency, we generate our input differences using a straightforward structure. This improves our performance and allows us to increase the number of samples that we make. However, the differentials that we try are no longer independent of each other. We have observed that the use of structures significantly increases the peaks in the bounds. The smaller the structures that we use, the lower the maximum probability bound tends to be. Therefore, we try to reduce the use of structures. We hope our next software version will allow us to eliminate structures altogether.

Apart from these numerical results, we know that certain differential patterns cannot occur. For example, if the input difference is restricted to the first input word of G, then the output difference must have active bits in both output words. Similarly, if the input difference is restricted to the second input word of G, then the output difference must have active bits in both halves, except when the output of the MDS matrix has a difference of 0x80000000. In this special case, we know that all four S-boxes in this half must be active (otherwise more than one byte in the output of the MDS matrix must change). Our software generates all these impossible differential patterns and sets the differential probabilities of the associated sets to zero.

Prob. Differential
2-20 x000 0x00 → 0000 xxxx
2-26 x000 0x00 → 0000 0xxx
2-22 x000 0x00 → xxxx 0000
2-26 000x x000 → 0000 xxxx
2-22 000x x000 → xxxx 0000
2-20 00x0 000x → 0000 xxxx
2-22 00x0 000x → xxxx 0000
2-14 0x00 0000 → 0x0x 0x0x

Table 9.8. Experimentally Found Differentials of F

Differentials of F. Given the results from the last section, we can now create a table of upper bounds on the differential probabilities of differentials of F. For each set of differentials we know how many active S-boxes there are. Let σ be the maximum probability of a differential of an S-box. We can now bound the probability of each set of differentials by multiplying the bounds that we found in the previous section on the set by the proper power of σ.

The value σ can be set in various ways. We know that most S-boxes have a best differential probability of 12/256. For the time being we will use this value for σ. Other values, especially larger ones, will be discussed later.

Experimental Results. We ran a test for differential characteristics of the F function with one active byte in each 4-byte g input and at most four active bytes at the 8-byte F output. Table 9.8 shows some interesting differentials that we found. The probabilities are very approximate, as they derive from numerical experiments with not too many samples. Note that in all but the last of these, the two active input bytes are going through the same key-dependent S-box due to the 8-bit rotate at the input of the second g function. The leading cause of all of these differentials seems to be that these two bytes generate the same output differential from the S-box, and thus the same differential at the output of the MDS matrix. With some positive probability, the PHT maps this into an output differential where one half is zero. This probability depends on the bit pattern of the differential just before the PHT.

The last characteristic has a different explanation. The input differential results in a differential at the output of the MDS matrix that has a relatively high chance of being converted into a 0x0x pattern after a constant is added to it. For example, the differential 03f2037a16 has a fair chance of being converted to a 0x0x pattern after a 32-bit addition.

All of these characteristics have a relatively low probability. We have not found any attack that directly relates to this property, but it provides a useful starting point for further research. In particular, upper bounds on the differential characteristics of F of this type could be used to improve our bound on the best differential characteristic for the full cipher.

9.4.3 Differentials of the Round Function

Once we have derived bounds on the differentials of F, we can do the same for the round function. The differential pattern at the start of the round is characterized by 16 bits; each bit indicates whether the differential pattern in the corresponding byte is non-zero. Given the characterization of the differential pattern at the input of the round, we know exactly which S-boxes are active. We can generate a list of all suitable differential patterns of F with their associated probability bound. Each of these differentials is combined with the other half of the input differential using the rotate and XOR operations. Each choice of the F differential set leads to several possible output differential patterns as the rotate and XORs can lead to different output characterizations depending on the exact differential.

For example, let us look at a differential pattern of 0110 in a 32-bit word. This pattern indicates that only the middle two bytes of the 4-byte word contain active differential bits. After a left rotation, the possible output differential patterns are: 0100, 0110, 1010, 1100, and 1110. XORing two differential patterns can similarly lead to a list of possible results. If we XOR two words, one with a differential pattern of 0101 and one with a differential pattern of 0011, then the possible result patterns are 0110 and 0111.

For each input differential pattern to the round, we can go through all possible F differential patterns, and generate all possible output patterns that can arise. For each possible output pattern of the round, we keep track of the largest upper bound that we generate this way. This produces an upper bound for each of the 232 possible input/output differential patterns of the round function.

9.4.4 Multi-round Patterns

The simplest way of generating multi-round patterns would be to use the list of 232 possible round patterns and a standard search algorithm. We use an algorithm that is somewhat more efficient than that. There are 216 possible differential patterns after r rounds, as each of the 16 data bytes can have a zero or non-zero difference. For each of the 216 possible patterns, we store an upper bound on the probability of any characteristic that has this difference pattern after r rounds. Furthermore, we store the list of differential patterns of F, and a precomputed table of how the rotates and XORs can propagate patterns. For each difference pattern after r + 1 rounds, we use this data to compute an upper bound on the probability of a differential characteristic that has this pattern after r + 1 rounds.

Given the output pattern of the round in question, we know the first half of the input pattern. This leaves us with 256 possible differential input patterns, and 256 possible differential patterns of F. Each of the 216 possible combinations is tried to see whether it can yield the required output differential pattern. The process can be speeded up by traversing either the F output patterns or the input patterns in decreasing order of probability and using some simple cut-off logic.

9.4.5 Results

The results depend on the parameters used to estimate the differential probabilities of G, and the values of γ and σ.


Previous Table of Contents Next